• Premise of the study: Next-generation sequencing (NGS) data are widely used for single-nucleotide
polymorphism discovery and genetic marker development in species with limited available genome
information. We developed microsatellite primers for the Proteaceae nut crop species Macadamia
integrifolia and assessed cross-species transferability in all congeners to investigate genetic
identification of cultivars and gene flow.

• Methods and Results: Primers were designed from both raw and assembled Illumina NGS paired-end
reads. The final 12 microsatellite markers selected were polymorphic among wild individuals of all
four Macadamia species (M. integrifolia, M. tetraphylla, M. ternifolia, and M. jansenii) and in
commercial macadamia cultivars including hybrids.

• Conclusions: We demonstrate the utility of raw and assembled Illumina NGS reads from total genomic
DNA for the rapid development of microsatellites in Macadamia. These primers will facilitate future
studies of population structure, hybridization, parentage, and cultivar identification in cultivated
and wild Macadamia populations.

Macadamia is a recently domesticated nut crop derived from the Australian subtropical rainforest
species Macadamia integrifolia Maiden & Betche and M. tetraphylla L. A. S. Johnson and their
hybrids. Within the genus, all species, including M. ternifolia F. Muell. and M. jansenii
C. L. Gross & P. H. Weston, are under threat of genetic erosion (Mast et al., 2008; Costello
et al., 2009). Commercial cultivars were developed primarily in Hawaii and are only a few
generations removed from Australian wild progenitors (Hardner et al., 2009). Macadamia are
preferentially outcrossing and take four to five years to reach maturity. For breeding programs to
progress effectively, there is a need to discriminate among clonally propagated industry standard
cultivars and novel selections well before maturity. When the 17 available M. integrifolia
microsatellite markers with perfect repeats (Schmidt et al., 2006) were tested in our laboratory,
only four amplified successfully. These results are consistent with previous research on
M. integrifolia (Neal, 2008), and no published study has used more than four polymorphic markers
(Shapcott and Powell, 2011; Spain and Lowe, 2011). Additional microsatellite markers are needed to
support conservation studies and breeding programs.

Next-generation sequencing (NGS) platforms are now routinely used for isolation of microsatellite,
or simple sequence repeat (SSR), loci from plants (Egan et al., 2012). Long-read platforms are
commonly used because reads of 300 to 500 bp in length may contain both the SSR motif and flanking
sequence for primer design (Zalapa et al., 2012). Together, paired-end reads from short-read
platforms also contain the SSR motif and flanking sequence for primer design at a lower cost per
base (Silva et al., 2013). The aim of this study was to develop polymorphic microsatellite markers
for Macadamia using paired-end Illumina reads with and without prior de novo assembly.



METHODS AND RESULTS 

Fresh leaf material was collected from macadamia nut cultivars at Clunes Varietal Trial M2, Clunes,
New South Wales, Australia (Stephenson and Gallagher, 2000). Additional cultivars and clones of
wild-collected individuals of all four Macadamia species were sourced from the Australian Macadamia
Germplasm Collection at Alstonville Tropical Fruit Research Station, NSW Department of Primary
Industries. Herbarium material is deposited at the Southern Cross University Medicinal Plant
Herbarium (PHARM), Lismore, New South Wales, Australia (Appendix 1). Fresh leaf material was either
stored at -80°C (for Illumina sequencing) or dried after collection in a sealed container with
silica gel at 10× the fresh weight of the leaf. Total DNA was extracted using a QIAGEN DNeasy Plant
Kit (QIAGEN, Valencia, California, USA) according to the manufacturer's protocols. Approximately
4.5 μg of DNA extracted from one individual of M. integrifolia was submitted to the Australian
Genome Research Facility, Melbourne, for sequencing. A DNA library was prepared with an Illumina
TruSeq Sample Preparation Kit (version 2) following the manufacturer's instructions (Illumina,
San Diego, California, USA). Genomic DNA was sheared using a
Table 1. Characterization of 12 polymorphic microsatellite loci developed in Macadamia integrifolia.^a

Locus    Primer sequences (5'-3')           Repeat motif  Fluorescent label  Allele size range (bp)  Ta (°C)  GenBank accession no.
Mac001   F: GTGACTGGTGGACACCAAAACCCA        (AT)11        VIC                412-420                 60       KF130888
         R: GCACTAGGTGTCACCCCCACTTCT
Mac002   F: CCCAACTGGGTTTGCAAGGACCAA        (CT)8         NED                283-297                 60       KF130889
         R: AGTAGCCGCGAGCTGATCGAAGAT
Mac003   F: TGGACCATTGAGGAGTTGGACTGT        (AT)9         FAM                258-276                 60       KF130890
         R: TCCACCGTTTCACTTTCGTCAGCC
Mac004   F: CAAGAGTGTCCAGCGAGGGAATGC        (AT)11        NED                224-240                 60       KF130891
         R: GGGAGACATCATACTTTTGACACATGCC
Mac005   F: CATAGCATGAGTTTCAAGGGATAA        (AAG)10       FAM                331-343                 60       KF130892
         R: ATTACAAACCCACTCTTCGATTT
Mac006   F: TTTCATCATTGATCATCATAGGTACA      (AG)11        PET                322-360                 55       KF130893
         R: GAGCTAATACTTAACCAGGTGAACA
Mac007   F: AGGCCTTGGGATGTTCCAGTGTGA        (CT)11        NED                368-390                 60       KF130894
         R: GCAATCAACACAAGCACCTGTGGC
Mac008   F: AACGGTTATGTCAAGTGCAACAGGA       (AT)10        FAM                388-398                 60       KF130895
         R: TGACTTTAGCCCTCACTTCAAAGCCA
Mac009   F: CAACTCTCTCTCCCTCAGATTCTC        (AAG)13       VIC                241-244                 60       KF130896
         R: TAAATCTATGCCACATCACTAGGC
Mac010   F: GCAACTGGATCAGCACATAAGAAT        (AG)11        PET                259-297                 55       KF130897
         R: GTAATTATCCCAACAGAACCCAAT
Mac011   F: AGAGGGCGAGATCCCTGACTCTGA        (CT)9         FAM                175-199                 60       KF130898
         R: TGAAATTTGGCGTGGGGAAAGCGT
Mac012   F: TATCAGGACCATCAACAATGATTT        (AC)10        VIC                309-321                 60       KF130899
         R: GCCTGTTGTAGGTAAAGTGGAGAT

Note: Ta = annealing temperature used for all Macadamia species and cultivars.

^a Values based on 22 samples representing Macadamia cultivars located at Clunes Varietal Trial M2, New South Wales, Australia.



Covaris S2 sonication device (Covaris, Woburn, Massachusetts, USA). DNA fragments were end-repaired,
A-tailed, and ligated to adapters. Size and concentration of DNA fragments were assessed using a
DNA 1000 chip on a Bioanalyzer 2100 instrument (Agilent Technologies, Santa Clara, California, USA).
Average insert size of the library was 424 bp. Approximately 4 pmol of the library was paired-end
sequenced (100 × 2 cycles) on an Illumina HiSeq 2000 instrument.

Paired-end reads were imported into CLC Genomics Workbench (version 4.9; CLC Bio, Aarhus, Denmark)
and trimmed to remove low-quality base calls (<Q20; P < 0.01) and adapter sequences. For the purpose
of primer design, reads containing SSR motifs were identified as follows. Raw sequence reads: the
search function was used to identify di- and trinucleotide SSR motifs with a minimum of eight
repeats in raw sequence reads. SSR regions were identified at the 3'-end of a read. Primers were
then designed in the flanking regions (i.e., 5'-end of the read containing the SSR) and in the
matching paired-end read. De novo contigs: trimmed reads were assembled de novo with the following
parameters: similarity index = 0.8; length fraction = 0.5; insertion/deletion cost = 3; mismatch
cost = 2. Contigs were screened for SSR regions using the search function described above. To
develop and optimize a suite of SSR markers for cultivar identification and gene flow studies,
primers were designed for 48 loci, 24 for each method, using a batch function in Primer3 version 2
(Rozen and Skaletsky, 2000), specifying a primer melting temperature (Tm) range of 58-70°C, a
maximum difference of 5°C, and a primer GC content of 40-60%. To minimize the cost of primer
synthesis during the testing phase, one primer from each pair was 5' modified with an engineered
sequence (5'-CCCCCGGGGGC-3') to enable the attachment of a third primer that was fluorescently
labeled using a two-step PCR protocol (Pacey-Miller and Henry, 2003). Primer pairs were tested for
amplification success and polymorphism among 12 DNA samples including eight M. integrifolia
cultivars and one individual from each Macadamia species. Of the 48 primer pairs tested, six did
not amplify and seven produced multiple bands. Of the remaining 35 loci, none were monomorphic,
with two or more alleles detected among the 12 test individuals. Primer sequences for these loci
are available on request from the author.
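The motif search and the GC-content criterion described above can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench search function or Primer3; it uses a regular expression as a stand-in, and the function names `find_ssrs` and `gc_fraction` and the example read are hypothetical.

```python
import re

# Perfect di-/trinucleotide motif repeated in tandem at least `min_repeats` times.
def find_ssrs(read, min_repeats=8):
    """Return (start, motif, repeat_count) for each perfect SSR found in a read."""
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(read.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:  # ignore homopolymer runs such as AAAA...
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


# Hypothetical read: 10 bp of flank, then (AT)11, then 5 bp of flank.
read = "GATTACAGGC" + "AT" * 11 + "CCGTA"
print(find_ssrs(read))
# Forward primer of Mac001 from Table 1, checked against the 40-60% GC window.
print(0.40 <= gc_fraction("GTGACTGGTGGACACCAAAACCCA") <= 0.60)
```

The reported motif is whichever rotation of the repeat unit starts first in the read (for example TA rather than AT when a T immediately precedes the repeat), so motifs may need to be normalized before loci are compared.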

Twelve microsatellite loci were selected for further development on the basis of single-band
amplification, level of polymorphism, and size compatibility for pooled multilocus capillary
electrophoresis. The 5' end of one primer of each pair was fluorescently labeled (Table 1).
Single-step PCR was performed in 20-μL reaction volumes containing approximately 20 ng DNA
template, 0.5 U Platinum Taq (Life Technologies, Carlsbad, California, USA), 2 μL Platinum Taq PCR
buffer, 0.1 mM dNTPs, 2 mM MgCl2, 0.2 μM of each primer, and sterile water to 20 μL. Thermal
cycling was conducted in a GeneAmp PCR System 9700 (Life Technologies) with the following
conditions: initial denaturation at 94°C for 2 min; followed by 35 cycles of 94°C for 10 s,
annealing temperature (Ta; Table 1) for 10 s, and extension at 70°C for 1 min; followed by a final
extension at 70°C for 5 min. Genotypes were generated using an ABI PRISM 3730 Genetic Analyzer
(Applied Biosystems, Foster City, California, USA). Allele size was scored in reference to ABI
PRISM GS (LIZ) internal size standards using the program Geneious version 6.1.6 (Biomatters Ltd.,
Auckland, New Zealand). We assessed variability and genotype consistency of the 12 loci in 22
macadamia cultivars (two to four replicate trees of each) including pure M. integrifolia and
hybrids. The loci were also tested for cross-amplification in wild-collected individuals of
M. integrifolia (n = 6), M. tetraphylla (n = 7), M. ternifolia (n = 2), and M. jansenii (n = 2).

After trimming, there were 245,099,904 reads, with an average length of 
91.57 bp. We identified 2.29 million reads containing di- and trinucleotide SSR 



Table 2. Genetic properties of 12 microsatellite loci in Macadamia integrifolia and hybrid industry
cultivars, and M. tetraphylla.

          Macadamia cultivars, Clunes      M. tetraphylla, northern
          Varietal Trial M2 (n = 22)       NSW (n = 7)
Locus     A        Ho       He             A        Ho       He
Mac001    5        0.545    0.676          5        0.714    0.704
Mac002    5        0.591    0.596          3        0.167    0.653
Mac003    7        0.667    0.683          5        0.857    0.714
Mac004    7        0.364    0.762          6        0.429    0.796
Mac005    4        0.591    0.654          2        0.429    0.337
Mac006    9        0.864    0.776          9        0.857    0.847
Mac007    6        0.682    0.653          4        0.857    0.684
Mac008    5        0.364    0.351          3        0.429    0.357
Mac009    2        0.091    0.165          2        0.143    0.133
Mac010    8        0.909    0.702          5        0.714    0.724
Mac011    7        0.864    0.800          9        0.714    0.837
Mac012    6        0.318    0.674          6        0.571    0.796
Mean      5.917    0.571    0.626          4.917    0.573    0.632

Note: A = number of alleles; He = expected heterozygosity; Ho = observed heterozygosity;
n = number of individuals sampled.
Fig. 1. Principal coordinate cluster plot based on genetic distance among multilocus genotypes for
Macadamia integrifolia (white), M. tetraphylla (red), M. ternifolia (blue), M. jansenii (yellow),
and macadamia cultivars (black). First and second coordinates explain 35.19% and 11.36% of the
variation, respectively.



motifs with a minimum of eight repeats. Amplification success at 60°C annealing temperature was
identical (87.5%) for primer pairs from unassembled reads and de novo assembled contigs. Genetic
diversity parameters and principal coordinate analysis (PCoA) were calculated using GenAlEx
version 6.5 (Peakall and Smouse, 2006, 2012) (Table 2).
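The per-locus parameters reported in Table 2 can be illustrated with a short Python sketch. It assumes the standard definitions (Ho as the proportion of heterozygous individuals, He as one minus the sum of squared allele frequencies) and is not the GenAlEx implementation; the function name `locus_stats` and the example genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Compute A, Ho, and He for one locus.

    genotypes: list of (allele1, allele2) tuples of fragment sizes in bp;
    use None for an individual that could not be scored.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    n = len(scored)
    counts = Counter(allele for genotype in scored for allele in genotype)
    # Observed heterozygosity: proportion of individuals carrying two different alleles.
    ho = sum(1 for a, b in scored if a != b) / n
    # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return {"A": len(counts), "Ho": ho, "He": he}


# Hypothetical genotypes for four diploid individuals and one unscored sample.
example = [(412, 414), (412, 412), (414, 416), (412, 414), None]
print(locus_stats(example))
```

Unscored individuals are dropped before the frequencies are computed, so the effective sample size can differ between loci.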

All 12 loci amplified and were polymorphic among the 22 cultivars. Mean observed (Ho) and expected
(He) heterozygosity were 0.571 and 0.626, respectively. A total of 71 alleles were detected, with
an average of 5.9 per locus (Table 2). Unique genotypes were obtained for each cultivar with the
exception of Hawaiian Agricultural Experiment Station (HAES) 741 and HAES 660, which shared 24 of
24 alleles. Selection records for these two cultivars are the same, suggesting that they may have
been sourced from the same tree at different times. Genotypes from replicate trees of cultivars
were consistent, with the exception of one of three HAES 791 trees, which is presumed to be a
misidentification because its genotype was identical to that of HAES 344. In M. tetraphylla, 59
alleles were found, with an average of 4.9 per locus. Mean Ho and He were 0.573 and 0.632,
respectively (Table 2). All loci amplified reliably in sampled wild M. integrifolia and
M. tetraphylla individuals and were polymorphic, with the exception of Mac009 in M. integrifolia.
Locus Mac005 in M. jansenii and Mac001 in M. ternifolia did not amplify. The remaining 11 loci
amplified in M. jansenii and M. ternifolia, and eight were polymorphic in the two individuals of
each of these species. Species-specific clusters were generated by two-dimensional PCoA based on
genetic distance. Most cultivars clustered with wild M. integrifolia individuals, although hybrid
cultivars such as A4 and A16 were intermediate between M. integrifolia and M. tetraphylla (Fig. 1).
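The comparison of multilocus genotypes used to flag identical cultivars (e.g., 24 of 24 alleles shared across 12 diploid loci) can be sketched as follows. This is an illustrative Python sketch, not the procedure used in the study; the function name `shared_alleles` and the example genotypes are hypothetical.

```python
from collections import Counter


def shared_alleles(geno_a, geno_b):
    """Return (shared, total) allele counts over loci scored in both samples.

    geno_a, geno_b: dicts mapping locus name to a tuple of allele sizes in bp.
    """
    shared = total = 0
    for locus in geno_a.keys() & geno_b.keys():
        alleles_a = Counter(geno_a[locus])
        alleles_b = Counter(geno_b[locus])
        # Multiset intersection counts each allele copy at most once per sample.
        shared += sum((alleles_a & alleles_b).values())
        total += len(geno_a[locus])
    return shared, total


# Hypothetical two-locus genotypes; allele order within a locus does not matter.
tree_1 = {"Mac001": (412, 414), "Mac002": (283, 283)}
tree_2 = {"Mac001": (414, 412), "Mac002": (283, 285)}
print(shared_alleles(tree_1, tree_2))
print(shared_alleles(tree_1, tree_1))
```

Two samples that share every allele at every locus are candidates for being the same clone, which is how a mislabeled replicate tree would be detected.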



CONCLUSIONS 

The microsatellite markers developed here enable discrimination among macadamia industry cultivars
and will be used to select parental genotypes in breeding programs. Cross-amplification and
polymorphism of the markers in all Macadamia species will facilitate studies of population
structure, gene flow, and hybridization. In this work, we demonstrate the effectiveness of Illumina
NGS paired-end sequence reads for rapid and cost-effective microsatellite development with and
without prior assembly of reads.
Appendix 1. Voucher information for Macadamia species used in this study.

Species                        Voucher specimen accession no.^a  Collection locality                                   Geographic coordinates
M. jansenii                    PHARM-13-0809                     Bulburin National Park, Queensland, Australia         24°37.584'S, 151°33.291'E
M. tetraphylla                 PHARM-13-0810                     Mullumbimby, northern New South Wales, Australia      28°32.835'S, 153°25.455'E
M. ternifolia                  PHARM-13-0811                     Draper, Queensland, Australia                         27°21.268'S, 152°54.965'E
M. integrifolia                PHARM-13-0812                     Villeneuve, Queensland, Australia                     26°58.384'S, 152°38.899'E
M. integrifolia, cultivar 741  PHARM-13-0813                     Clunes Varietal Trial M2, New South Wales, Australia  28°43.844'S, 153°23.699'E

^a Vouchers deposited at Southern Cross University, Medicinal Plant Herbarium (PHARM), Lismore, New South Wales, Australia.